Structured Knowledge Space


A multifaceted software system enables increased exploitation of a vast store of intelligence and military reporting.

Structured Knowledge Space (SKS) is an end-to-end software system developed to solve a problem that has frustrated national security decision makers: “How do we take advantage of the enormous amounts of information communicated daily through a wide variety of reporting venues?” Various factors make it difficult for decision makers to search and correlate the wealth of information contained in these reports:

• Documents are often stored in Microsoft PowerPoint, Adobe PDF, or other formats not well suited to search or to computer-based analysis.

• Reports are often disseminated via email or other ad hoc channels, further hindering search and discovery of critical battlefield or intelligence information.

• The number and variety of organizations involved produce a high volume and velocity of data that lack a coordinated indexing system.

• Although documents vary greatly, from brief daily updates to lengthy analyses, they commonly use domain-specific jargon and abbreviations, and “boilerplate” text, such as headers and disclaimers, provides no new information but clogs the search process.

SKS combines open-source technologies (e.g., Java and Lucene), custom-built software, and domain knowledge about important entities in intelligence reporting to create a robust system that facilitates searching over a document collection that had previously been largely unsearchable. SKS builds searchable archives of text-based intelligence reports, extracts information from free-form documents, and makes that information discoverable through a keyword and faceted-search interface. SKS’s tools support searching for approximate name matches and for geographic locations referenced in text. SKS’s modern tiered architecture scales to meet significant data storage and retrieval demands.

Figure: Structured Knowledge Space (SKS) creates structured metadata (essentially data about other data) to improve the discovery and use of unstructured reports, i.e., reports such as Word documents or email that are not organized in a predefined model such as a database or table.
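SKS itself builds on Lucene, so the following is only a toy illustration of keyword plus faceted search, written in plain Python rather than against any Lucene API. It assumes each document arrives with its text and a dictionary of already-extracted facet values; the class `FacetedIndex` and the facet name `location` are hypothetical. An inverted index maps each term to the documents containing it, and facet counts are tallied over whatever documents a query returns.

```python
import re
from collections import Counter, defaultdict


class FacetedIndex:
    """Toy inverted index with facet counts (hypothetical; not the SKS or Lucene API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.facets = {}                  # doc id -> {facet name: [values]}

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text, facets):
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)
        self.facets[doc_id] = facets

    def search(self, query, **filters):
        """Documents containing every query term, narrowed by exact facet filters."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        for name, value in filters.items():
            hits = {d for d in hits if value in self.facets[d].get(name, [])}
        return hits

    def facet_counts(self, hits, name):
        """Count facet values across a result set, e.g. to populate a filter sidebar."""
        return Counter(v for d in hits for v in self.facets[d].get(name, []))


idx = FacetedIndex()
idx.add("r1", "Convoy observed near the bridge", {"location": ["Alpha"]})
idx.add("r2", "Convoy delayed at checkpoint", {"location": ["Bravo"]})
print(sorted(idx.search("convoy")))            # ['r1', 'r2']
print(idx.search("convoy", location="Bravo"))  # {'r2'}
```

The facet filter narrows an existing keyword result rather than running a second search, which is the usual way faceted interfaces let users drill down one attribute at a time.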

SKS exploits modern natural language processing and information retrieval techniques to improve the ability to search, analyze, and effectively utilize intelligence reports and the valuable information that they contain. Its functionality is similar to niche capabilities in other industries, e.g., Google News for aggregating news sources and Radian6 for social media analysis. However, SKS was designed to meet the specific needs of the military and intelligence communities.

Capabilities of SKS

SKS started as an R&D effort and has since been productized and fully integrated into several customer information processing and dissemination chains. SKS’s features increase users’ ability to exploit the knowledge captured in the multitude of intelligence and operational documents generated and filed each day:

• Users can query for approximate name matches or geographic locations referenced in documents.

• Special features deal with transliterated Arabic names, which present challenges because of the inconsistent spellings that arise when Arabic characters are represented with English letters. This capability was driven by specific user needs that made more general-purpose commercial software less useful.

• SKS includes features for data browsing and trend analysis.

Figure: The Structured Knowledge Space search page provides diverse, useful information.
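The article does not say how SKS handles transliteration variants. One common heuristic, sketched here under that caveat, is to reduce each name to a rough consonant skeleton (Arabic script usually omits short vowels, so vowel choices are a major source of spelling variation) and then allow a small edit distance between skeletons. The normalization rules, the `max_dist` threshold, and the function names are hypothetical choices for illustration.

```python
import re


def normalize(name: str) -> str:
    """Reduce a transliterated name to a rough consonant skeleton (hypothetical rules)."""
    s = name.lower()
    s = re.sub(r"\b(?:al|el|ul)[- ]", "", s)  # drop article prefixes such as "al-" or "el "
    s = re.sub(r"[^a-z ]", "", s)              # strip apostrophes, hyphens, digits
    s = s.replace("q", "k")                    # q and k are often interchanged
    skeleton = []
    for word in s.split():
        word = re.sub(r"(.)\1+", r"\1", word)  # collapse doubled letters: "mm" -> "m"
        # Keep the first letter, then drop vowels, which vary most between spellings.
        skeleton.append(word[0] + re.sub(r"[aeiou]", "", word[1:]))
    return " ".join(skeleton)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def names_match(a: str, b: str, max_dist: int = 1) -> bool:
    """Treat two names as variants if their skeletons differ by at most max_dist edits."""
    return edit_distance(normalize(a), normalize(b)) <= max_dist


print(normalize("Mohammed"))               # mhmd
print(names_match("Muhammad", "Mohamed"))  # True
print(names_match("Youssef", "Yusuf"))     # True
print(names_match("Muhammad", "Yusuf"))    # False
```

Short skeletons can collide, so a production matcher would typically combine a rule set tuned to the user community with a stricter threshold or a ranked list of candidates rather than a yes/no answer.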

SKS’s functionality relies on its ability to efficiently and accurately recognize and extract entities from documents. An entity is the textual representation of a person’s name, possibly including military rank; an organization’s name; a place name (city, region, country, etc.); or a specialized item such as a date-time group (a common way of representing dates and times in the U.S. military) or a geospatial coordinate. SKS employs rules and dictionaries to discover and extract such entities. Several of the rule-based extractors are quite complex, so to make them more computationally efficient, they are implemented as tries, i.e., tree data structures for efficient retrieval of words and phrases.
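As a rough illustration of trie-based extraction, the sketch below stores dictionary phrases word by word in a trie and scans a document for the longest dictionary match starting at each token. The phrases, the entity labels such as `ORG` and `LOC`, and the class names are hypothetical; the article does not describe the structure of SKS’s own tries or rules.

```python
import re
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # next word -> child node
        self.label: Optional[str] = None            # entity type if a phrase ends here


class PhraseTrie:
    """Word-level trie for dictionary lookup of multi-word entity names."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, phrase: str, label: str) -> None:
        node = self.root
        for word in phrase.lower().split():
            node = node.children.setdefault(word, TrieNode())
        node.label = label

    def extract(self, text: str) -> List[Tuple[str, str]]:
        """Return (matched text, label) pairs, preferring the longest match at each position."""
        tokens = re.findall(r"[A-Za-z0-9'-]+", text)
        found: List[Tuple[str, str]] = []
        i = 0
        while i < len(tokens):
            node, best_end, best_label = self.root, None, None
            j = i
            # Walk down the trie as long as the next token continues some phrase.
            while j < len(tokens) and tokens[j].lower() in node.children:
                node = node.children[tokens[j].lower()]
                j += 1
                if node.label is not None:
                    best_end, best_label = j, node.label
            if best_end is None:
                i += 1
            else:
                found.append((" ".join(tokens[i:best_end]), best_label))
                i = best_end  # resume after the match so matches do not overlap
        return found


trie = PhraseTrie()
trie.add("United Nations", "ORG")
trie.add("United Nations Security Council", "ORG")
trie.add("Kabul", "LOC")
print(trie.extract("The United Nations Security Council met in Kabul."))
# [('United Nations Security Council', 'ORG'), ('Kabul', 'LOC')]
```

Because each token advances at most one trie edge, the scan cost grows with document length and the longest dictionary phrase rather than with the number of dictionary entries, which is the efficiency argument for tries.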

Extracted entities are indexed to enable efficient search and discovery. SKS creates structured metadata (essentially data about other data, e.g., the source of the data, the date the data were collected, or the size of a data file) to improve indexing. This indexing also draws on various similarity scores, allowing users to search by exact match or to search for documents similar to ones already discovered. SKS can search for documents containing a geospatial coordinate or time reference within a specified geospatial or temporal region of interest. The system also provides a reverse gazetteer, which describes where extracted geospatial coordinates are located relative to named locations. SKS can also search by ingest date and by word or phrase trends, thus improving analysts’ ability to connect related information.
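A reverse gazetteer of the kind described above can be approximated by finding the nearest named place to a coordinate and reporting the great-circle distance and compass direction from that place. This sketch assumes a small in-memory gazetteer of hypothetical place names with (latitude, longitude) pairs in decimal degrees, uses the haversine formula with a mean Earth radius of 6,371 km, and treats points within 1 km as being at the place itself; none of these details comes from SKS.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def reverse_geocode(lat, lon, gazetteer):
    """Describe (lat, lon) relative to the nearest place in a {name: (lat, lon)} gazetteer."""
    name, (plat, plon) = min(gazetteer.items(),
                             key=lambda item: haversine_km(lat, lon, *item[1]))
    dist = haversine_km(plat, plon, lat, lon)
    if dist < 1.0:
        return f"at {name}"
    bearing = initial_bearing_deg(plat, plon, lat, lon)  # direction from the place to the point
    return f"{dist:.0f} km {COMPASS[round(bearing / 45) % 8]} of {name}"


gazetteer = {"Alpha": (0.0, 0.0), "Bravo": (10.0, 10.0)}  # hypothetical places
print(reverse_geocode(0.0, 1.0, gazetteer))  # 111 km E of Alpha
```

A linear scan over the gazetteer is fine for a handful of places; a large gazetteer would normally sit behind a spatial index so the nearest-place lookup does not touch every entry.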

Data discovery and extraction by SKS go beyond the individual-document level by offering capabilities for summarizing data holdings at the result-set and corpus-wide levels. While some systems can show counts of entities or phrases across multiple documents, SKS provides analysts with summaries of key topics across the whole corpus. Thus, SKS enables users to view data at the level of detail appropriate for their current task.
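Result-set and corpus-wide summaries can share one routine that counts, for each extracted entity, how many documents mention it. The sketch below is a minimal version assuming entities have already been extracted per document; the function name and the document-frequency ranking are illustrative choices, not SKS’s documented method.

```python
from collections import Counter


def top_entities(doc_entities, doc_ids=None, k=3):
    """Top-k entities ranked by how many documents mention them.

    doc_entities maps a document id to the entities extracted from it.
    Pass doc_ids to summarize one result set; omit it to summarize the corpus.
    """
    ids = doc_entities.keys() if doc_ids is None else doc_ids
    doc_freq = Counter()
    for d in ids:
        doc_freq.update(set(doc_entities[d]))  # count each entity once per document
    # Highest count first; ties broken alphabetically so output is deterministic.
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


corpus = {"r1": ["Alpha", "Alpha", "UN"], "r2": ["Alpha"], "r3": ["Bravo", "UN"]}
print(top_entities(corpus))          # [('Alpha', 2), ('UN', 2), ('Bravo', 1)]
print(top_entities(corpus, ["r3"]))  # [('Bravo', 1), ('UN', 1)]
```

Counting documents rather than raw mentions keeps one long report that repeats a name many times from dominating the summary.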

SKS includes techniques for clustering documents into groups with similar content. This capability allows users to rapidly scan the topics available in a document collection to help them find the subset of most interest. SKS’s flexible mechanisms for ingesting documents include an upload web page and the ability to monitor email accounts and directories. Because of this flexibility, SKS can be used as a general-purpose document repository and discovery tool, e.g., on a company intranet.
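The article does not name the clustering algorithm SKS uses. As one plausible stand-in, the sketch below represents each document as a TF-IDF vector, compares documents by cosine similarity (the same score can rank documents similar to one already found), and groups them with a simple single-pass “leader” clustering: each document joins the most similar existing cluster if that cluster’s first document is at least `threshold` similar, and otherwise starts a new one. The tokenizer, the smoothed IDF formula, and the 0.3 threshold are assumptions.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document, with smoothed IDF."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def leader_cluster(docs, threshold=0.3):
    """Single-pass clustering; returns lists of document indices."""
    vecs = tfidf_vectors(docs)
    leaders, clusters = [], []  # leaders[c] is the first document of clusters[c]
    for i, v in enumerate(vecs):
        sims = [cosine(v, vecs[j]) for j in leaders]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


docs = [
    "convoy ambush on highway near checkpoint",
    "ambush on convoy reported at highway checkpoint",
    "election results announced in provincial council",
    "provincial council election turnout announced",
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

A single pass is cheap but order-dependent; batch methods such as k-means or agglomerative clustering avoid that dependence at a higher computational cost.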

Benefits of SKS

SKS’s features increase military and intelligence analysts’ ability to make use of the large collection of documents generated each day. As an illustration of the scope of data SKS can handle, a feed of information from the Open Source Center generated approximately 3,000 new documents per day for an SKS development system.

SKS offers a service that did not previously exist. Its document clustering and search capabilities can reveal connections that may be extremely useful to analysts by

• Finding all documents referring to an organization (even when the organization has several aliases and/or name variations)

• Finding all documents referring to a particular person (even when the person has several aliases and/or name transliterations)

• Finding all documents with a geospatial reference within a certain distance of a location

• Finding all documents with a time reference within a specified date range
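The geospatial and temporal searches in the last two bullets reduce to simple checks on entities already extracted from each document. The sketch below assumes documents are dictionaries with hypothetical `coords` and `times` fields, parses date-time groups only in the `DDHHMMZ MON YY` form (for example, `091630Z JUL 11`) with the Z (UTC) zone letter, and assumes two-digit years fall in the 2000s; real date-time groups can carry other time-zone letters, which this sketch ignores.

```python
import math
import re
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}
# Day, hour, minute, "Z", month abbreviation, two-digit year, e.g. "091630Z JUL 11".
DTG_RE = re.compile(r"\b(\d{2})(\d{2})(\d{2})Z\s+([A-Z]{3})\s+(\d{2})\b")


def parse_dtgs(text):
    """Extract Zulu date-time groups from text as timezone-aware UTC datetimes."""
    found = []
    for dd, hh, mm, mon, yy in DTG_RE.findall(text.upper()):
        if mon not in MONTHS:
            continue
        try:
            found.append(datetime(2000 + int(yy), MONTHS[mon], int(dd),
                                  int(hh), int(mm), tzinfo=timezone.utc))
        except ValueError:  # impossible dates or times, e.g. day 31 in April
            continue
    return found


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(doc, center, radius_km):
    """True if any extracted coordinate lies within radius_km of center."""
    return any(haversine_km(center[0], center[1], lat, lon) <= radius_km
               for lat, lon in doc["coords"])


def within_dates(doc, start, end):
    """True if any extracted time reference falls in [start, end]."""
    return any(start <= t <= end for t in doc["times"])


doc = {"id": "r1", "coords": [(0.0, 0.5)],
       "times": parse_dtgs("Patrol departed 091630Z JUL 11.")}
utc = timezone.utc
print(within_radius(doc, (0.0, 0.0), 100))  # True (about 56 km away)
print(within_dates(doc, datetime(2011, 7, 1, tzinfo=utc),
                   datetime(2011, 7, 31, tzinfo=utc)))  # True
```

Keeping the extracted coordinates and times as structured fields means both filters run over metadata alone, without re-reading document text at query time.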

SKS provides a much-needed capability in the national security domain. Its current users are primarily in the military and intelligence communities; however, other communities, such as law enforcement or border protection, may find use for information gleaned from the reporting. The near-term road map for SKS includes increasing the sophistication of its text-mining algorithms and providing early demonstrations of unstructured data processing on the Department of Defense’s emerging cloud platforms.